Google DeepMind wants to know if chatbots are just virtue signaling
Google DeepMind is calling for the moral behavior of large language models--such as what they do when called on to act as companions, therapists, medical advisors, and so on--to be scrutinized with the same kind of rigor as their ability to code or do math. As LLMs improve, people are asking them to play more and more sensitive roles in their lives. Agents are starting to take actions on people's behalf. LLMs may be able to influence human decision-making. And yet nobody knows how trustworthy this technology really is at such tasks. With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published today. That's not the case for moral questions, which typically have a range of acceptable answers: "Morality is an important capability but hard to evaluate," says Isaac. "In the moral domain, there's no right and wrong," adds Haas.
How social media encourages the worst of AI boosterism
The era of hype first, think later. Demis Hassabis, CEO of Google DeepMind, summed it up in three words: "This is embarrassing." Hassabis was replying on X to an overexcited post by Sébastien Bubeck, a research scientist at the rival firm OpenAI, announcing that two mathematicians had used OpenAI's latest large language model, GPT-5, to find solutions to 10 unsolved problems in mathematics. "Science acceleration via AI has officially begun," Bubeck crowed. Put your math hats on for a minute, and let's take a look at what this beef from mid-October was about. Bubeck was excited that GPT-5 seemed to have somehow solved a number of puzzles known as Erdős problems.
OpenAI's new LLM exposes the secrets of how AI really works
The experimental model won't compete with the biggest and best, but it could tell us why they behave in weird ways--and how trustworthy they really are. ChatGPT maker OpenAI has built an experimental large language model that is far easier to understand than typical models. That's a big deal, because today's LLMs are black boxes: Nobody fully understands how they do what they do. Building a model that is more transparent sheds light on how LLMs work in general, helping researchers figure out why models hallucinate, why they go off the rails, and just how far we should trust them with critical tasks. "As these AI systems get more powerful, they're going to get integrated more and more into very important domains," Leo Gao, a research scientist at OpenAI, said in an exclusive preview of the new work. "It's very important to make sure they're safe."
The AI Industry's Scaling Obsession Is Headed for a Cliff
Huge AI infrastructure deals assume that algorithms will keep improving with scale. A new study from MIT suggests the biggest and most computationally intensive AI models may soon offer diminishing returns compared to smaller models. By mapping scaling laws against continued improvements in model efficiency, the researchers found that it could become harder to wring leaps in performance from giant models, whereas efficiency gains could make models running on more modest hardware increasingly capable over the next decade. "In the next five to 10 years, things are very likely to start narrowing," says Neil Thompson, a computer scientist and professor at MIT involved in the study. Leaps in efficiency, like those seen with DeepSeek's remarkably low-cost model in January, have already served as a reality check for the AI industry, which is accustomed to burning massive amounts of compute.
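The core tension--power-law gains from raw compute versus compounding efficiency gains--can be sketched numerically. A minimal illustration, assuming a toy Chinchilla-style loss curve; the constants, compute budgets, and 30%-per-year efficiency rate below are illustrative placeholders, not figures from the MIT study:

```python
# Toy power-law scaling: loss falls as compute^-alpha.
# a, alpha, the budgets, and the 30%/year efficiency gain are assumptions.
a, alpha = 1.0, 0.05

def loss(compute):
    """Illustrative scaling law: lower loss with more training compute."""
    return a * compute ** (-alpha)

frontier = 1e25   # hypothetical frontier training budget (FLOPs)
modest   = 1e22   # hypothetical modest budget, 1000x less

gap_now = loss(modest) - loss(frontier)

# If algorithmic efficiency improves 30% per year, a modest budget in
# 10 years buys the equivalent of modest * 1.3**10 effective compute.
effective_modest_future = modest * 1.3 ** 10
gap_future = loss(effective_modest_future) - loss(frontier)

print(gap_future < gap_now)  # True: the performance gap narrows
```

Under this toy curve the frontier model's edge shrinks even though the modest budget never changes--the "narrowing" Thompson describes comes entirely from the compounding efficiency term.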
Tesla Is Urging Drowsy Drivers to Use 'Full Self-Driving'. That Could Go Very Wrong
Experts say that advising customers to switch it on when they're drifting between lanes is exactly the wrong move. Since Tesla launched its Full Self-Driving (FSD) feature in beta in 2020, the company's owner's manual has been clear: Contrary to the name, cars using the feature can't drive themselves. Tesla's driver assistance system is built to handle plenty of road situations--stopping at stop lights, changing lanes, steering, braking, turning. Still, "Full Self-Driving (Supervised) requires you to pay attention to the road and be ready to take over at all times," the manual states.
We Need a New Ethics for a World of AI Agents
Gabriel, Iason, Keeling, Geoff, Manzini, Arianna, Evans, James
We need a new ethics for a world of AI agents. The deployment of capable AI agents raises fresh questions about safety, human-machine relationships and social coordination. Artificial intelligence (AI) developers are shifting their focus to building agents that can operate independently, with little human intervention. To be an agent is to have the ability to perceive and act on an environment in a goal-directed and autonomous way. For example, a digital agent could be programmed to browse the web and make online purchases on behalf of a user--comparing prices, selecting items and completing checkouts.
Elon Musk brags he lured Meta's top stars away despite jaw-dropping offers to stay
Elon Musk has raided Meta's collection of talented researchers, despite Mark Zuckerberg reportedly offering some a fortune to choose his company instead. The workers were part of Zuckerberg's AI team, helping Meta in the global race to build superintelligence, an almost godlike form of artificial intelligence that could think for itself and be much smarter than any human. Musk himself has gloated about the departures, posting on X that 'many strong Meta engineers have and are joining xAI and without the need for insane initial [compensation].' At least 14 Meta researchers and engineers have left for their new home at Musk's AI competitor since January, while others have fled to OpenAI, the creator of ChatGPT. A spokesperson for Meta told the Daily Mail: 'Some attrition is normal for any organization of this size.'
The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
Landau, Gilad, Özdogan, Miran, Elvers, Gereon, Mantegna, Francesco, Somaiya, Pratik, Jayalath, Dulhan, Kurth, Luisa, Kwon, Teyun, Shillingford, Brendan, Farquhar, Greg, Jiang, Minqi, Jerbi, Karim, Abdelhedi, Hamza, Ramos, Yorguin Mantilla, Gulcehre, Caglar, Woolrich, Mark, Voets, Natalie, Jones, Oiwi Parker
The advance of speech decoding from non-invasive brain data holds the potential for profound societal impact. Among its most promising applications is the restoration of communication to paralysed individuals affected by speech deficits such as dysarthria, without the need for high-risk surgical interventions. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an "ImageNet moment", or breakthrough, in non-invasive neural decoding by harnessing the collective power of the machine learning community. To facilitate this vision, we present the largest within-subject MEG dataset recorded to date (LibriBrain), together with a user-friendly Python library (pnpl) for easy data access and integration with deep learning frameworks. For the competition we define two foundational tasks (Speech Detection and Phoneme Classification from brain data), complete with standardised data splits and evaluation metrics, illustrative benchmark models, online tutorial code, a community discussion board, and a public leaderboard for submissions. To promote accessibility and participation, the competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing, accelerating progress toward a non-invasive brain-computer interface for speech.
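At its simplest, the Speech Detection task can be framed as windowed binary classification over continuous sensor data. A minimal sketch of that framing on synthetic data; the channel count, sampling rate, and window length here are placeholder values, and the real competition data comes from LibriBrain via the `pnpl` library, whose API is not shown:

```python
import numpy as np

# Synthetic stand-in for continuous MEG: (channels, time) at a fixed rate.
rng = np.random.default_rng(0)
n_channels, sfreq = 306, 250            # assumed MEG-like sensor count and Hz
window = sfreq                           # 1-second classification windows
meg = rng.standard_normal((n_channels, 10 * sfreq))
labels = rng.integers(0, 2, size=10)     # one speech/silence label per second

def make_windows(data, labels, win):
    """Split (channels, time) data into (n_windows, channels, win) examples."""
    n = data.shape[1] // win
    X = data[:, : n * win].reshape(data.shape[0], n, win).transpose(1, 0, 2)
    return X, labels[:n]

X, y = make_windows(meg, labels, window)
print(X.shape, y.shape)  # (10, 306, 250) (10,)
```

Each `(channels, win)` slice then becomes one training example for whatever detector a participant builds on top, which is what the competition's standardised splits and leaderboard evaluate.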
Andrew Barto and Richard Sutton win 2024 Turing Award
The Association for Computing Machinery has named Andrew Barto and Richard Sutton as the recipients of the 2024 ACM A.M. Turing Award. The pair have received the honour for "developing the conceptual and algorithmic foundations of reinforcement learning". In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning. The Turing Award comes with a $1 million prize, to be split between the recipients. Since its inception in 1966, the award has honoured computer scientists and engineers annually.
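One of the core ideas in that body of work is temporal-difference learning, which updates a value estimate toward a bootstrapped target after every step. A minimal TD(0) sketch; the 5-state chain environment and the learning-rate and discount constants are illustrative choices, not taken from the award citation:

```python
import random

random.seed(0)

n_states = 5          # states 0..4; reaching state 4 ends the episode with reward 1
alpha, gamma = 0.1, 0.9
V = [0.0] * n_states  # value estimates, all initialised to zero

for _ in range(2000):
    s = 0
    while s != n_states - 1:
        # random walk: step left or right, clipped to the chain
        s_next = max(0, min(n_states - 1, s + random.choice([-1, 1])))
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V)  # estimates rise toward the rewarding end of the chain
```

After training, states closer to the terminal reward carry higher estimated value, which is exactly the credit-assignment behaviour that made these methods foundational.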
Towards a Realistic Long-Term Benchmark for Open-Web Research Agents
Mühlbacher, Peter, Bosse, Nikos I., Phillips, Lawrence
We present initial results of a forthcoming benchmark for evaluating LLM agents on white-collar tasks of economic value. We evaluate agents on real-world "messy" open-web research tasks of the type that are routine in finance and consulting. In doing so, we lay the groundwork for an LLM agent evaluation suite where good performance directly corresponds to a large economic and societal impact. We built and tested several agent architectures with o1-preview, GPT-4o, Claude-3.5 Sonnet, Llama 3.1 (405b), and GPT-4o-mini. On average, LLM agents powered by Claude-3.5 Sonnet and o1-preview substantially outperformed agents using GPT-4o, with agents based on Llama 3.1 (405b) and GPT-4o-mini lagging noticeably behind. Across LLMs, a ReAct architecture with the ability to delegate subtasks to subagents performed best. In addition to quantitative evaluations, we qualitatively assessed the performance of the LLM agents by inspecting their traces and reflecting on their observations. Our evaluation represents the first in-depth assessment of agents' abilities to conduct challenging, economically valuable analyst-style research on the real open web.
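The winning pattern the benchmark identifies--a ReAct loop that can delegate subtasks to subagents--can be sketched in a few lines. A hedged mock-up: the `llm` function below is a hard-coded stand-in for a real model call (such as Claude 3.5 Sonnet or o1-preview), and the action names and prompts are illustrative, not the paper's actual prompts:

```python
def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned action per context."""
    if "SUBTASK" in prompt:
        return "FINISH: subtask result"
    if "Observation" in prompt:
        return "FINISH: answer based on subagent findings"
    return "DELEGATE: look up the market size"

def react_agent(task: str, depth: int = 0, max_depth: int = 2) -> str:
    """Thought/action/observation loop; DELEGATE spawns a fresh subagent."""
    history = f"Task: {task}\n"
    for _ in range(5):  # cap the action loop
        action = llm(history)
        if action.startswith("FINISH:"):
            return action.removeprefix("FINISH:").strip()
        if action.startswith("DELEGATE:") and depth < max_depth:
            subtask = action.removeprefix("DELEGATE:").strip()
            # the subagent starts with its own clean context
            observation = react_agent("SUBTASK " + subtask, depth + 1)
            history += f"Observation: {observation}\n"
    return "gave up"

print(react_agent("estimate the market for EV batteries"))
```

The design point the benchmark highlights is the delegation step: handing a subtask to a subagent with a fresh context keeps the parent agent's history short, which matters on long, messy open-web research tasks.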